Data Representation

Subject: Computer Science
Topic: 4
Cambridge Code: 0478


Number Systems

Binary (Base 2)

Binary - Base 2 numbering system (digits 0-1)

Position values: 2^7, 2^6, 2^5, 2^4, 2^3, 2^2, 2^1, 2^0 = 128, 64, 32, 16, 8, 4, 2, 1

Example: 10110₂ = 1(16) + 0(8) + 1(4) + 1(2) + 0(1) = 22₁₀

Converting decimal to binary:

  • Divide by 2 repeatedly until the quotient is 0
  • The remainders give the binary digits
  • Read the remainders from bottom to top (last remainder first)
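
The repeated-division steps can be sketched in Python. This is a minimal illustration rather than a required method; the function name to_binary is an assumption made for this example.

```python
def to_binary(n):
    """Convert a non-negative integer to a binary string by repeated division."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)        # quotient carries on, remainder is the next bit
        remainders.append(str(r))
    # The first remainder is the least significant bit, so read in reverse.
    return "".join(reversed(remainders))

print(to_binary(22))  # 10110
```

For 22 the remainders are 0, 1, 1, 0, 1 in the order they are produced, which reads as 10110 from bottom to top.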

Hexadecimal (Base 16)

Hexadecimal - Base 16 (digits 0-9, A-F)

Digits: 0,1,2,3,4,5,6,7,8,9,A(10),B(11),C(12),D(13),E(14),F(15)

Position values: 16^3, 16^2, 16^1, 16^0 = 4096, 256, 16, 1

Example: 2F₁₆ = 2(16) + 15(1) = 47₁₀
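
The position-value method works the same way for any base. The sketch below applies it to both hexadecimal and binary; the names DIGITS and to_decimal are hypothetical, chosen only for this example.

```python
DIGITS = "0123456789ABCDEF"

def to_decimal(text, base):
    """Convert a digit string in the given base (2 to 16) to an integer."""
    total = 0
    for ch in text.upper():
        value = DIGITS.index(ch)          # A=10, B=11, ... F=15
        if value >= base:
            raise ValueError(f"digit {ch!r} is not valid in base {base}")
        # Shift the running total one position left, then add the new digit.
        total = total * base + value
    return total

print(to_decimal("2F", 16))    # 47
print(to_decimal("10110", 2))  # 22
```

Python's built-in int("2F", 16) gives the same result; the loop is written out only to show the position values at work.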

Advantages:

  • Compact representation
  • Easy conversion to binary
  • Used for memory addresses, colors

Converting Between Systems

Binary ↔ Hexadecimal:

  • 1 hex digit = 4 binary digits
  • Group binary in fours
  • Convert each group

Example: 10110011₂ = B3₁₆

  • 1011₂ = B₁₆
  • 0011₂ = 3₁₆
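
The grouping rule can be sketched as follows. The function names are assumptions for illustration, and the binary input is padded with leading zeros when its length is not a multiple of four.

```python
def binary_to_hex(bits):
    """Group a binary string into fours and convert each group to a hex digit."""
    padded = bits.zfill((len(bits) + 3) // 4 * 4)   # pad on the left to a multiple of 4
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join("0123456789ABCDEF"[int(group, 2)] for group in groups)

def hex_to_binary(hex_digits):
    """Expand each hex digit into exactly four binary digits."""
    return "".join(format(int(digit, 16), "04b") for digit in hex_digits)

print(binary_to_hex("10110011"))  # B3
print(hex_to_binary("B3"))        # 10110011
```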

Data Units

Bit: Single binary digit (0 or 1)

Byte: 8 bits

Size Conversions

Unit             Size
Kilobyte (KB)    1,024 bytes
Megabyte (MB)    1,024 KB
Gigabyte (GB)    1,024 MB
Terabyte (TB)    1,024 GB

Note: In casual usage, 1,024 is often rounded to 1,000 (for example, 1 KB ≈ 1,000 bytes)

Calculating Storage

Example: How many bits in 5 MB?

5 MB × 1024 KB/MB × 1024 bytes/KB × 8 bits/byte = 41,943,040 bits
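
The same calculation can be checked with a short script. It assumes the 1,024-based units from the Size Conversions table; the names BYTES_PER_UNIT and to_bits are hypothetical.

```python
# Bytes per unit, using the 1,024-based definitions from the table above.
BYTES_PER_UNIT = {
    "B": 1,
    "KB": 1024,
    "MB": 1024 ** 2,
    "GB": 1024 ** 3,
    "TB": 1024 ** 4,
}

def to_bits(amount, unit):
    """Convert an amount in the given unit to bits (8 bits per byte)."""
    return amount * BYTES_PER_UNIT[unit] * 8

print(to_bits(5, "MB"))  # 41943040
```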


Character Encoding

ASCII (American Standard Code for Information Interchange)

ASCII - 7-bit encoding (128 characters)

Ranges:

  • 0-31: Control characters
  • 32-47: Space (32) and punctuation
  • 48-57: Digits 0-9
  • 65-90: Uppercase A-Z
  • 97-122: Lowercase a-z

Example:

  • 'A' = 65 = 01000001₂
  • '0' = 48 = 00110000₂
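
Python's built-in ord and chr functions map between characters and their codes, which makes the examples easy to verify. The helper name ascii_info is an assumption for this sketch.

```python
def ascii_info(ch):
    """Return the character code and its 8-bit binary form."""
    code = ord(ch)
    if code > 127:
        raise ValueError("not a 7-bit ASCII character")
    return code, format(code, "08b")

for ch in "A0":
    print(ch, *ascii_info(ch))
# A 65 01000001
# 0 48 00110000

print(chr(97))  # a
```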

Extended ASCII

Extended ASCII - 8-bit encoding (256 characters)

  • Includes accented characters
  • Special symbols
  • Scientific characters

Unicode

Unicode - Universal character set

UTF-8: Variable-length (1-4 bytes)

  • ASCII compatible
  • Most common on web

UTF-16: Variable-length (2 or 4 bytes)

  • Used in many applications

Advantages:

  • Supports all languages
  • Emojis and special characters
  • Global compatibility
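
The byte lengths of UTF-8 and UTF-16 can be observed directly with Python's str.encode. This is a small sketch; the helper name encoded_lengths and the sample characters are chosen for illustration, and "utf-16-be" is used so that no byte-order mark is added to the count.

```python
def encoded_lengths(ch):
    """Return (UTF-8 length, UTF-16 length) in bytes for one character."""
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-be")   # big-endian, without a byte-order mark
    return len(utf8), len(utf16)

# A, e with acute accent, euro sign, grinning-face emoji
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]
for ch in samples:
    print(f"U+{ord(ch):04X}", encoded_lengths(ch))
# U+0041 (1, 2)
# U+00E9 (2, 2)
# U+20AC (3, 2)
# U+1F600 (4, 4)
```

The first line shows why UTF-8 is ASCII compatible: an ASCII character still takes a single byte.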

Image Representation

Bitmap (Raster) Images

Bitmap - Grid of colored pixels

Color representation:

  • RGB: Red, Green, Blue (each 0-255)
  • Example: 255,0,0 = Pure red

Color depth:

  • 8-bit: 256 colors
  • 16-bit: 65,536 colors
  • 24-bit: 16.7 million colors

File size calculation: Size = width × height × color depth

Example: 100×100 pixels, 24-bit

Size = 100 × 100 × 24 bits = 240,000 bits = 30,000 bytes ≈ 29.3 KB
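
A short script can reproduce the bitmap calculation and the color counts for each depth. The function name bitmap_size_bits is hypothetical, and the KB figure assumes 1 KB = 1,024 bytes as in the Size Conversions table. Real image files also carry headers and may be compressed, which this sketch ignores.

```python
def bitmap_size_bits(width, height, color_depth):
    """Uncompressed bitmap size in bits: one color value per pixel."""
    return width * height * color_depth

bits = bitmap_size_bits(100, 100, 24)
print(bits)                        # 240000
print(bits // 8)                   # 30000 (bytes)
print(round(bits / 8 / 1024, 1))   # 29.3 (KB, with 1 KB = 1,024 bytes)

# Number of distinct colors available at a given color depth is 2 ** depth.
for depth in (8, 16, 24):
    print(depth, 2 ** depth)
# 8 256
# 16 65536
# 24 16777216
```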

Vector Images

Vector - Mathematical descriptions of shapes

Advantages:

  • Scalable without quality loss
  • Smaller file sizes (simple shapes)
  • Resolution independent

Disadvantages:

  • Not suitable for complex images such as photographs
  • Less photorealistic

Image Compression

Lossy compression:

  • Removes data
  • Smaller file size
  • Quality degradation
  • Examples: JPEG (images), MP4 (video)

Lossless compression:

  • No data removal
  • Larger file size than lossy compression
  • Perfect restoration of the original
  • Examples: PNG, GIF, ZIP

Sound Representation

Sound Digitization

Sampling - Measuring the amplitude of the sound wave at regular intervals

Sampling rate: How many samples are taken per second (measured in Hz)

  • CD quality: 44.1 kHz
  • Professional: 48 kHz
  • Telephony: 8 kHz
  • Higher rate = better quality

Sample resolution (Bit depth):

  • 8-bit: 256 volume levels
  • 16-bit: 65,536 volume levels
  • 24-bit: 16.7 million levels
  • Higher = better quality

File size calculation: Size = Sampling rate × Duration (s) × Bit depth

Example: 44.1 kHz, 16-bit, 3 minutes (mono)

Size = 44,100 × 180 × 16 = 127,008,000 bits = 15,876,000 bytes ≈ 15.1 MB (using 1 MB = 1,024 × 1,024 bytes)
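
The sound calculation can be sketched the same way. The function name sound_size_bits is hypothetical, and the channels parameter is an assumption added for this example (it defaults to 1, matching the formula above, which has no channel term). The MB figure assumes 1 MB = 1,024 × 1,024 bytes.

```python
def sound_size_bits(sample_rate_hz, duration_s, bit_depth, channels=1):
    """Uncompressed audio size in bits.

    `channels` is an illustrative extra: 1 for mono, 2 for stereo.
    """
    return sample_rate_hz * duration_s * bit_depth * channels

bits = sound_size_bits(44_100, 3 * 60, 16)
print(bits)                             # 127008000
print(bits // 8)                        # 15876000 (bytes)
print(round(bits / 8 / 1024 ** 2, 1))   # 15.1 (MB, with 1 MB = 1,024 * 1,024 bytes)
```

A stereo recording with the same settings would be twice as large, since each channel is sampled separately.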

Sound Compression

Lossy (MP3, AAC):

  • Removes inaudible frequencies
  • 10:1 compression ratio typical
  • Acceptable quality loss

Lossless (FLAC; WAV is usually stored uncompressed):

  • Preserves all data
  • Larger files
  • Perfect reproduction

Text Compression

Run-Length Encoding (RLE)

RLE - Replace repeated characters with count + character

Example: AAABBCDDD → 3A2B1C3D

Efficiency depends on the data: RLE is very effective for repetitive data, but it can make data with few repeats larger (for example, ABC → 1A1B1C)
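
A minimal RLE sketch in Python, using the count + character format from the example. The function names are assumptions, and the decoder assumes the original text contains no digits, since digits are read as counts.

```python
from itertools import groupby

def rle_encode(text):
    """Replace each run of a repeated character with count + character."""
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))

def rle_decode(encoded):
    """Reverse rle_encode. Assumes the original text contained no digits."""
    output = []
    count = ""
    for ch in encoded:
        if ch.isdigit():
            count += ch                    # counts may have more than one digit
        else:
            output.append(ch * int(count))
            count = ""
    return "".join(output)

print(rle_encode("AAABBCDDD"))   # 3A2B1C3D
print(rle_decode("3A2B1C3D"))    # AAABBCDDD
print(rle_encode("ABC"))         # 1A1B1C
```

The last line shows the weak case: with no repeats, the encoded form is longer than the input.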

Dictionary Compression

Lempel-Ziv-Welch (LZW):

  • Replaces repeating sequences with codes
  • Adaptive dictionary
  • GIF images use this; ZIP files typically use DEFLATE, a related dictionary-based method
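
The adaptive dictionary idea can be illustrated with a textbook-style LZW sketch. It is a simplified teaching version, not the exact variant used by any particular file format: it assumes every input character has a code below 256, outputs a plain list of integer codes, and never limits the dictionary size. The function names are hypothetical.

```python
def lzw_compress(text):
    """Return a list of integer codes for text (characters with codes 0-255)."""
    dictionary = {chr(i): i for i in range(256)}   # start with single characters
    next_code = 256
    current = ""
    output = []
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate                    # keep extending the match
        else:
            output.append(dictionary[current])     # emit code for longest match
            dictionary[candidate] = next_code      # learn the new sequence
            next_code += 1
            current = ch
    if current:
        output.append(dictionary[current])
    return output

def lzw_decompress(codes):
    """Rebuild the text, reconstructing the same dictionary on the fly."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            entry = previous + previous[0]         # code not yet in dictionary
        else:
            raise ValueError("invalid LZW code")
        output.append(entry)
        dictionary[next_code] = previous + entry[0]
        next_code += 1
        previous = entry
    return "".join(output)

codes = lzw_compress("ABABABA")
print(codes)                   # [65, 66, 256, 258]
print(lzw_decompress(codes))   # ABABABA
```

Seven characters are reduced to four codes because the repeated sequences AB and ABA are replaced by dictionary entries 256 and 258.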

Error Detection and Correction

Parity Bit

Parity - Extra bit for error detection

Even parity: Total 1s (including parity) = even

Odd parity: Total 1s (including parity) = odd

Example (even): 1011010 → 10110100 (four 1s is already even, so add 0)
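
Parity generation and checking can be sketched as below. The function names are assumptions, and the parity bit is appended at the end of the word, as in the example; some systems place it at the start instead.

```python
def add_parity(bits, even=True):
    """Append a parity bit so the total number of 1s is even (or odd)."""
    ones = bits.count("1")
    if even:
        parity = ones % 2          # 1 only if the count is currently odd
    else:
        parity = (ones + 1) % 2    # 1 only if the count is currently even
    return bits + str(parity)

def check_parity(word, even=True):
    """Return True if the received word has the expected parity."""
    expected_remainder = 0 if even else 1
    return word.count("1") % 2 == expected_remainder

sent = add_parity("1011000")       # three 1s, so even parity appends 1
print(sent)                        # 10110001
print(check_parity(sent))          # True
print(check_parity("10010001"))    # False (one bit flipped in transit)
```

A single parity bit detects any odd number of flipped bits, but two flipped bits cancel out and go unnoticed.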

Checksum

Checksum - Value calculated from the data, such as the sum of the data bytes modulo some value

  • Added to end of data
  • Receiver verifies by recalculating
  • Detects transmission errors
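
The send-and-verify steps can be sketched with a very simple checksum. This assumes a byte sum modulo 256 stored in one extra byte at the end; real protocols use a range of different checksum algorithms, and the function names here are hypothetical.

```python
def checksum(data, modulus=256):
    """Sum of the data bytes, reduced modulo `modulus`."""
    return sum(data) % modulus

def add_checksum(data):
    """Sender: append the checksum byte to the end of the data."""
    return data + bytes([checksum(data)])

def verify(packet):
    """Receiver: recalculate the checksum and compare with the received one."""
    payload, received = packet[:-1], packet[-1]
    return checksum(payload) == received

packet = add_checksum(b"HELLO")
print(checksum(b"HELLO"))            # 116
print(verify(packet))                # True
print(verify(b"HELLP" + packet[-1:]))  # False (payload changed in transit)
```

A plain sum cannot detect every error: swapping two bytes, for instance, leaves the total unchanged.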

Error Correcting Codes

Hamming code:

  • Detects and corrects single-bit errors
  • Multiple parity bits at specific positions
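
As an illustration, the sketch below implements the common Hamming(7,4) layout: 4 data bits plus 3 even-parity bits at positions 1, 2 and 4. The choice of this particular code and the function names are assumptions for the example; other Hamming codes use longer words.

```python
def hamming74_encode(data):
    """Encode 4 data bits (a string such as '1011') into a 7-bit codeword."""
    d3, d5, d6, d7 = (int(b) for b in data)   # data sits at positions 3, 5, 6, 7
    p1 = d3 ^ d5 ^ d7    # covers positions 1, 3, 5, 7
    p2 = d3 ^ d6 ^ d7    # covers positions 2, 3, 6, 7
    p4 = d5 ^ d6 ^ d7    # covers positions 4, 5, 6, 7
    return "".join(str(b) for b in (p1, p2, d3, p4, d5, d6, d7))

def hamming74_correct(word):
    """Return (corrected codeword, error position); position 0 means no error."""
    bits = [int(b) for b in word]             # index 0 holds position 1
    s1 = bits[0] ^ bits[2] ^ bits[4] ^ bits[6]
    s2 = bits[1] ^ bits[2] ^ bits[5] ^ bits[6]
    s4 = bits[3] ^ bits[4] ^ bits[5] ^ bits[6]
    error_position = s1 + 2 * s2 + 4 * s4     # failed checks spell out the position
    if error_position:
        bits[error_position - 1] ^= 1         # flip the faulty bit back
    return "".join(str(b) for b in bits), error_position

codeword = hamming74_encode("1011")
print(codeword)                        # 0110011
print(hamming74_correct("0110111"))    # ('0110011', 5)
print(hamming74_correct(codeword))     # ('0110011', 0)
```

In the second call, the bit at position 5 was flipped; the parity checks for positions 1 and 4 fail, and 1 + 4 = 5 identifies the bit to correct.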

Key Points

  1. Binary: Base 2 (0-1)
  2. Hexadecimal: Base 16 (0-9, A-F)
  3. Data units: Bit, byte, KB, MB, GB, TB
  4. ASCII: 7-bit, 128 characters
  5. Unicode: Supports all languages
  6. Bitmap: Pixel-based, color depth matters
  7. Vector: Math-based, scalable
  8. Lossy compression removes data
  9. Lossless compression preserves all data

Practice Questions

  1. Convert binary ↔ decimal ↔ hexadecimal
  2. Calculate file sizes
  3. Explain character encodings
  4. Compare bitmap vs vector
  5. Calculate image/sound file sizes
  6. Apply RLE compression
  7. Detect parity errors

Revision Tips

  • Practice number conversions
  • Know data unit relationships
  • Understand ASCII/Unicode
  • Know color depth effects
  • Understand sampling rate importance
  • Compare compression types
  • Calculate file sizes accurately